Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Probability model-based algorithm for non-uniform data clustering
YANG Tianpeng, CHEN Lifei
Journal of Computer Applications    2018, 38 (10): 2844-2849.   DOI: 10.11772/j.issn.1001-9081.2018020375
Abstract653)      PDF (1008KB)(376)       Save
Aiming at the "uniform effect" of the traditional K-means algorithm, a new probability model-based algorithm was proposed for non-uniform data clustering. Firstly, a Gaussian mixture distribution model was proposed to describe the clusters hidden within non-uniform data, allowing the datasets to contain clusters with different densities and sizes at the same time. Secondly, the objective optimization function for non-uniform data clustering was deduced based on the model, and an EM (Expectation Maximization)-type clustering algorithm defined to optimize the objective function. Theoretical analysis shows that the new algorithm is able to perform soft subspace clustering on non-uniform data. Finally, experimental results on synthetic datasets and real datasets demostrate that the accuracy of the proposed algorithm is increased by 5% to 50% compared with the existing K-means-type algorithms and under-sampling algorithms.
Reference | Related Articles | Metrics
Soft subspace clustering algorithm for imbalanced data
CHENG Lingfang, YANG Tianpeng, CHEN Lifei
Journal of Computer Applications    2017, 37 (10): 2952-2957.   DOI: 10.11772/j.issn.1001-9081.2017.10.2952
Abstract521)      PDF (935KB)(672)       Save
Aiming at the problem that the current K-means-type soft-subspace algorithms cannot effectively cluster imbalanced data due to uniform effect, a new partition-based algorithm was proposed for soft subspace clustering on imbalanced data. First, a bi-weighting method was proposed, where each attribute was assigned a feature-weight and each cluster was assigned a cluster-weight to measure its importance for clustering. Second, in order to make a trade-off between attributes with different types or those categorical attributes having various numbers of categories, a new distance measurement was then proposed for mixed-type data. Third, an objective function was defined for the subspace clustering algorithm on imbalanced data based on the bi-weighting method, and the expressions for optimizing both the cluster-weights and feature-weights were derived. A series of experiments were conducted on some real-world data sets and the results demonstrated that the bi-weighting method used in the new algorithm can learn more accurate soft-subspace for the clusters hidden in the imbalanced data. Compared with the existing K-means-type soft-subspace clustering algorithms, the proposed algorithm yields higher clustering accuracy on imbalanced data, achieving about 50% improvements on the bioinformatic data used in the experiments.
Reference | Related Articles | Metrics